Object-goal navigation (Object-nav) entails searching for, recognizing, and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are restricted to static objects (e.g., television, fridge). We propose a modular framework for object-nav that efficiently searches indoor environments not just for static objects but also for movable objects (e.g., fruits, glasses, phones) that frequently change position due to human intervention. Our contextual-bandit agent explores the environment efficiently by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting to demonstrate high sample efficiency and reliability.
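To make "optimism in the face of uncertainty" concrete, the sketch below scores each navigable location with a plain UCB1 rule. This is a simpler stand-in for the contextual bandit the abstract describes, not the authors' method; the function name, the counts, and the constant `c` are assumptions made for illustration.

```python
import math

def ucb_scores(successes, visits, total_visits, c=2.0):
    """Optimistic estimate of the spotting likelihood at each location.

    Plain UCB1 (an assumption; the abstract's agent is a contextual bandit).
    """
    scores = []
    for s, n in zip(successes, visits):
        if n == 0:
            # Never-visited locations get an infinite score so they are tried first.
            scores.append(float("inf"))
        else:
            mean = s / n                                       # empirical spotting rate
            bonus = math.sqrt(c * math.log(total_visits) / n)  # shrinks with more visits
            scores.append(mean + bonus)
    return scores

# Hypothetical counts for three navigable locations.
successes = [8, 1, 0]
visits = [10, 2, 0]
scores = ucb_scores(successes, visits, total_visits=sum(visits))
best = max(range(len(scores)), key=scores.__getitem__)
```

The rarely visited second location outscores the well-explored first one despite a lower empirical rate, which is the optimistic behaviour the abstract refers to. Scores like these could then serve as the rewards passed to a route planner.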
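Humans have an innate ability to identify and differentiate instances they are unfamiliar with by leveraging and adapting the knowledge they have acquired so far. Importantly, they achieve this without degrading performance on what they learned earlier. Inspired by this, we identify and formulate a new, pragmatic problem setting, NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model with incrementally discovering novel categories of instances from unlabeled data while maintaining its performance on previously seen categories. We propose 1) a method to generate pseudo-latent representations that act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information-based regularizer that enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier that aids generalized inference when the test data contains instances from both seen and unseen categories. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery. Our extensive evaluations show that existing models catastrophically forget previously seen categories while identifying novel ones, whereas our method effectively balances the competing objectives. We hope our work will attract further research into this newly identified pragmatic problem setting.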
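This paper considers the problem of hierarchical multi-label classification (HMC), where (i) several labels can be present for each example, and (ii) labels are related through a domain-specific hierarchy tree. Guided by the intuition that not all mistakes are equal, we present Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP), a framework that penalizes a misprediction according to its severity with respect to the hierarchy tree. While some works apply this idea to single-label classification, to the best of our knowledge few works on multi-label classification focus on the severity of mistakes. The key reason is that there is no clear way to quantify the severity of a misprediction in the multi-label setting. In this work, we propose a simple but effective metric to quantify the severity of a mistake in HMC, which leads naturally to CHAMP. Extensive experiments on six public HMC datasets across modalities (image, audio, and text) show that incorporating hierarchical information brings substantial gains: CHAMP improves both AUPRC (2.6% median percentage improvement) and hierarchical metrics (2.85% median percentage improvement) over stand-alone hierarchical or multi-label classification methods. Compared with standard multi-label baselines, CHAMP provides improved AUPRC both in robustness (8.87% mean percentage improvement) and in low-data regimes. Furthermore, our method provides a framework for enhancing existing multi-label classification algorithms so that they make better mistakes (18.1% mean percentage increment).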
Current supervised visual detectors, though impressive within their training distribution, often fail to segment out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses can be insufficient for instance segmentation tasks, without also considering architectural inductive biases. For image segmentation, recent slot-centric generative models break such dependence on supervision by attempting to segment scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised instance segmentation model equipped with a slot-centric inductive bias, that is adapted per scene at test time through gradient descent on reconstruction or novel view synthesis objectives. We show that test-time adaptation in Slot-TTA greatly improves instance segmentation in out-of-distribution scenes. We evaluate Slot-TTA in several 3D and 2D scene instance segmentation benchmarks and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and self-supervised test-time adaptation methods.
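We consider the novel problem of unsupervised domain adaptation of source models for semantic segmentation without access to the source data. Unsupervised domain adaptation aims to adapt a model learned on labeled source data to a new, unlabeled target dataset. Existing methods assume that the source data is available together with the target data during adaptation. In practical scenarios, however, we may have access only to the source model and the unlabeled target data, not to the labeled source data. In this work, we propose a self-training approach to extract knowledge from the source model. To compensate for the distribution shift from source to target, we first update the normalization parameters of the network using the unlabeled target data. We then employ confidence-filtered pseudo-labeling and enforce consistency under certain transformations. Despite being very simple and intuitive, our framework achieves significant performance gains over directly applying the source model to the target data, as reflected in our extensive experiments and ablation studies. In fact, the performance is only a few points away from recent state-of-the-art methods that use source data for adaptation. We further demonstrate the generalizability of the proposed approach to the fully test-time adaptation setting, where we need no target training data and adapt only at test time.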
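The confidence-filtering step can be illustrated with a short sketch. The threshold value, the `IGNORE` label, and the function name below are assumptions chosen for illustration; the abstract does not specify them.

```python
IGNORE = 255  # hypothetical "ignore" label for pixels that fail the confidence test

def pseudo_labels(probs, threshold=0.9):
    """Turn per-pixel class probabilities into confidence-filtered pseudo-labels.

    probs: one list of class probabilities per pixel.
    Pixels whose top probability is below `threshold` are marked IGNORE.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)  # most likely class
        labels.append(best if p[best] >= threshold else IGNORE)
    return labels

# Three pixels, three classes; the middle pixel is too uncertain to keep.
probs = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25], [0.05, 0.92, 0.03]]
labels = pseudo_labels(probs)  # [0, 255, 1]
```

In a self-training loop, a loss would typically be computed only on the pixels that kept a label, so uncertain predictions from the source model do not reinforce themselves.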
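In test-time adaptation (TTA), given a model trained on some source data, the goal is to adapt it to make better predictions for test instances from a different distribution. Crucially, TTA assumes no access to the source data, or even to any additional labeled or unlabeled samples from the target distribution, for fine-tuning the source model. In this work, we consider TTA in a more pragmatic setting that we call SITA (Single Image Test-time Adaptation). Here, when making each prediction, the model has access only to the given single test instance rather than a batch of instances, as has typically been considered in the literature. This is motivated by realistic scenarios in which inference is needed on demand and cannot be delayed to "batch-ify" incoming requests, or in which inference runs on an edge device (such as a mobile phone) where there is no scope for batching. The entire adaptation process in SITA must be extremely fast because it happens at inference time. To address this, we propose AugBN, a novel approach for the SITA setting that requires only forward propagation. The approach can adapt any off-the-shelf trained model to individual test instances for both classification and segmentation tasks. AugBN estimates the normalization statistics of the unseen test distribution from the given test image using only one forward pass with label-preserving transformations. Because AugBN involves no back-propagation, it is significantly faster than other recent methods. To the best of our knowledge, this is the first work to address this hard adaptation problem using only a single test image. Despite being very simple, our framework achieves significant performance gains over directly applying the source model to the target instances, as reflected in our extensive experiments and ablation studies.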
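As a loose illustration of estimating normalization statistics from a single test image without back-propagation, the sketch below pools one channel's values over a few augmented copies of the image and blends the result with stored source statistics. It is not AugBN itself: the brightness-scaling augmentations, the blending weight, and all names are assumptions for illustration only.

```python
def pooled_stats(values):
    """Mean and (biased) variance of a flat list of activations."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def single_image_stats(channel, source_mean, source_var,
                       factors=(0.8, 1.0, 1.2), source_weight=0.5):
    """Estimate normalization statistics for one channel of one test image.

    channel: flat list of values for a single channel.
    factors: hypothetical brightness scalings used as label-preserving augmentations.
    source_weight: hypothetical weight on the statistics stored in the source model.
    """
    # Pool the channel over all augmented copies, as one batched forward pass would.
    pooled = [v * f for f in factors for v in channel]
    mean, var = pooled_stats(pooled)
    # Blend with the source statistics so one image does not dominate the estimate.
    w = source_weight
    return w * source_mean + (1 - w) * mean, w * source_var + (1 - w) * var

channel = [0.2, 0.4, 0.6, 0.8]
mean, var = single_image_stats(channel, source_mean=0.0, source_var=1.0)
```

Everything here is computed in a forward direction only, which is the property the abstract credits for the method's speed; how the real method chooses augmentations and combines statistics is not stated in the abstract.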
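Graph neural networks (GNNs) are a popular technique for modeling graph-structured data; they compute node-level representations by aggregating information from the local neighborhood of each node. However, this aggregation implies an increased risk of revealing sensitive information, because a node can participate in the inference for multiple nodes. As a result, standard privacy-preserving machine learning techniques such as differentially private stochastic gradient descent (DP-SGD), which are designed for situations where each data point participates in the inference for only one point, either do not apply or lead to inaccurate solutions. In this work, we formally define the problem of learning 1-layer GNNs with node-level privacy and provide an algorithmic solution with a strong differential privacy guarantee. Even though each node can be involved in the inference for multiple nodes, by employing a careful sensitivity analysis and a non-trivial extension of the privacy-by-amplification technique, our method provides accurate solutions with solid privacy parameters. Empirical evaluation on standard benchmarks demonstrates that our method is indeed able to learn accurate privacy-preserving GNNs while still outperforming standard non-private methods that completely ignore graph information.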
This paper presents a comprehensive survey of low-light image and video enhancement. We begin with the challenging case of mixed over-/under-exposed images, on which existing methods underperform. To this end, we propose two variants of the SICE dataset named SICE_Grad and SICE_Mix. Next, we introduce Night Wenzhou, a large-scale, high-resolution video dataset, to address the lack of low-light video datasets, which has discouraged the application of low-light image enhancement (LLIE) to videos. The Night Wenzhou dataset is challenging because it consists of fast-moving aerial scenes and streetscapes with varying illumination and degradation. We conduct extensive key technique analysis and experimental comparisons of representative LLIE approaches using these newly proposed datasets and the current benchmark datasets. Finally, we address unresolved issues and propose future research topics for the LLIE community.
We investigate data-driven texture modeling via analysis and synthesis with generative adversarial networks. For network training and testing, we have compiled a diverse set of spatially homogeneous textures, ranging from stochastic to regular. We adopt StyleGAN3 for synthesis and demonstrate that it produces diverse textures beyond those represented in the training data. For texture analysis, we propose GAN inversion using a novel latent domain reconstruction consistency criterion for synthesized textures, and iterative refinement with Gramian loss for real textures. We propose perceptual procedures for evaluating network capabilities, exploring the global and local behavior of latent space trajectories, and comparing with existing texture analysis-synthesis techniques.
Recent advances in deep learning research, such as transformers, have bolstered the ability of automated agents to generate creative texts similar to those a human would write. By default, transformer decoders can only generate new text with respect to previously generated text. The output distribution of candidate tokens at any position is conditioned on previously selected tokens using a self-attention mechanism to emulate the property of autoregression. This is inherently limiting for tasks such as controllable story generation, where it may be necessary to condition on future plot events when writing a story. In this work, we propose Future Sight, a method for finetuning a pretrained generative transformer on the task of future conditioning. Transformer decoders are typically pretrained on the task of completing a context, one token at a time, by means of self-attention. Future Sight additionally enables a decoder to attend to an encoded future plot event. This motivates the decoder to expand on the context in a way that logically concludes with the provided future. During inference, the future plot event can be written by a human author to steer the narrative being generated in a certain direction. We evaluate the efficacy of our approach on a story generation task with human evaluators.
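The autoregressive constraint described above is commonly enforced with a causal attention mask. The sketch below builds such a mask and then a variant in which every position may also attend to a few extra slots holding an encoded future event. The second function is only one hypothetical way to picture future conditioning; the abstract does not say how Future Sight wires the future event into attention, and both function names are assumptions.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def mask_with_future(n, n_future):
    """Hypothetical variant: n_future extra key slots, visible to every position.

    The first n_future columns stand for an encoded future plot event;
    the remaining n columns follow the usual causal pattern.
    """
    return [[True] * n_future + row for row in causal_mask(n)]

mask = causal_mask(3)
# [[True, False, False],
#  [True, True,  False],
#  [True, True,  True ]]

future = mask_with_future(3, 2)
# Each row gains two leading True entries for the future-event slots.
```

With the plain causal mask, the first token sees only itself; in the hypothetical variant it also sees the future-event slots, which is the sense in which generation can be steered toward a provided ending.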